Automatic Identification of Phonetic Similarity Based on Underspecification
Authors
Abstract
This paper presents a novel approach to identifying phonetic similarity using properties observed during the speech recognition process. In the experiments, specific phones are removed during the training phase of a statistical speech recognition system, and the behaviour of the system is then analysed to see which alternative phone it selects. The analysis is restricted to specific contexts, and the alternatively recognised (or substituted) phones are analysed with respect to a number of factors, namely their common phonetic properties, their phonetic neighbourhood and their frequency of occurrence in a particular corpus. The results indicate that a measure of phonetic similarity based on the observed properties of alternatively recognised phones can be predicted from a combination of these factors, and as such can serve as an important additional source of information for modelling pronunciation variation.
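As a rough illustration of the kind of analysis the abstract describes, the sketch below tallies which phones a recogniser chose in place of a removed phone and scores each substitute by its overlap in phonetic features with the removed phone. Everything here is an assumption made for illustration: the tiny `FEATURES` table, the Jaccard overlap as the notion of "common phonetic properties", and the function names `shared_feature_ratio` and `substitution_profile` are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical, deliberately tiny feature table (an assumption for
# illustration; the abstract does not give the paper's feature inventory).
FEATURES = {
    "p": {"plosive", "bilabial", "voiceless"},
    "b": {"plosive", "bilabial", "voiced"},
    "t": {"plosive", "alveolar", "voiceless"},
    "m": {"nasal", "bilabial", "voiced"},
}


def shared_feature_ratio(a, b):
    """Jaccard overlap between the feature sets of phones a and b."""
    fa, fb = FEATURES[a], FEATURES[b]
    return len(fa & fb) / len(fa | fb)


def substitution_profile(removed, substitutes):
    """Summarise the phones recognised in place of a removed phone.

    `substitutes` is a list of phones the recogniser output where the
    removed phone was expected. For each distinct substitute, return how
    often it was chosen (relative frequency) and its feature overlap
    with the removed phone.
    """
    counts = Counter(substitutes)
    total = sum(counts.values())
    return {
        phone: {
            "rel_freq": n / total,
            "feature_overlap": shared_feature_ratio(removed, phone),
        }
        for phone, n in counts.items()
    }


if __name__ == "__main__":
    # Made-up substitution observations for a removed /p/.
    profile = substitution_profile("p", ["b", "b", "t", "m"])
    for phone, stats in sorted(profile.items()):
        print(phone, stats)
```

A profile of this kind could then be compared against corpus frequencies or neighbourhood information, but how the paper actually combines those factors is not stated in the abstract.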
Similar resources
Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures
Automatic short answer grading (ASAG) is the automated process of assessing natural-language answers using computational methods and machine learning algorithms. The development of large-scale smart education systems on the one hand, and the importance of assessment as a key factor in the learning process and the challenges it faces on the other, have significantly increased the need for ...
String Similarity Measures and PAM-like Matrices for Cognate Identification
We present a new automatic learning system for the identification of cognates, words that derive from a common ancestor and share the same etymological origin. Our approach combines and adapts several techniques developed for biological sequence analysis to the natural language processing environment. We design a linguistic-inspired matrix to align sensibly our training dataset. We introduce a ...
Against Underspecification in Speech Errors
This paper argues against the use of phonological underspecification in feature matrices on the basis of speech error data. Stemberger (1991) argues that phonological underspecification influences the similarity of phonemes. He claims that underspecified features do not count toward similarity, based on an analysis of phoneme confusions in a naturally occurring speech error corpus. Using the same corp...
Automatic identification of confusable drug names
OBJECTIVE: Many hundreds of drugs have names that either look or sound so much alike that doctors, nurses and pharmacists can get them confused, dispensing the wrong one in errors that can injure or even kill patients. METHODS AND MATERIAL: We propose to address the problem through the application of two new methods: one based on orthographic similarity ("look-alike"), and the other based on pho...